You are viewing the RapidMiner Studio documentation for version 10.1.

Read Kafka Topic (Kafka Connector)

Synopsis

This operator reads messages from a Kafka topic on a specific Kafka cluster.

Description

The operator can either retrieve all previous messages available on the topic or collect new incoming messages. New messages are collected either for a specified amount of time or until a specified number of messages has been retrieved.
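
The difference between reading previous messages and reading new ones can be illustrated with a toy in-memory topic. This is a minimal sketch, not the operator's implementation: the function `messages_to_read` and the list `log` are hypothetical names, and the strategy names `earliest` and `latest` are taken from the offset_strategy parameter described under Parameters. Whether the operator also picks up messages that arrive while old ones are being retrieved is not stated here, so the sketch assumes it does not.

```python
from typing import List


def messages_to_read(log: List[str], size_at_start: int, offset_strategy: str) -> List[str]:
    """Return the messages a reader would see on a toy topic.

    log             -- every message on the topic, oldest first
    size_at_start   -- how many messages existed when reading started
    offset_strategy -- "earliest" or "latest"
    """
    if offset_strategy == "earliest":
        # Previous messages: everything that was already on the topic.
        return log[:size_at_start]
    if offset_strategy == "latest":
        # New messages: only what arrived after reading started.
        return log[size_at_start:]
    raise ValueError(f"unknown offset strategy: {offset_strategy!r}")


# Three messages exist before the reader starts, two arrive afterwards.
log = ["m0", "m1", "m2", "m3", "m4"]
previous = messages_to_read(log, 3, "earliest")
new = messages_to_read(log, 3, "latest")
print(previous, new)
```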

Input

  • connection (Connection)

    The connection to the Kafka server from which the messages are read.

Output

  • out (Data Table)

    The ExampleSet with the collected messages.

Parameters

  • kafka_topic

    The name of the Kafka topic which should be read.

    Range:
  • update_topics

    Try to retrieve a list of available topics from the server.

    Range:
  • offset_strategy

    The polling strategy for the topic.

    • earliest: Messages are retrieved beginning with the earliest available message
    • latest: Only new incoming messages are collected
    Range:
  • retrieval_time_out

    Time out for retrieving old messages. It can typically be relatively short unless millions of records are retrieved. Only applicable if the offset strategy is set to earliest.

    Range:
  • collection_strategy

    The strategy for collecting new messages: either by duration, meaning the operator waits and collects all new messages that arrive in the next n seconds, or by number, meaning the operator waits until n messages have been retrieved.

    • duration: The operator will wait and collect all new messages incoming in the next n seconds
    • number: The operator will wait until n messages are retrieved
    Range:
  • counter

    Counter for the collection strategy. It is either the duration in seconds for which the operator waits or the number of messages to collect.

    Range:
  • time_out

    If the collection strategy is number, this is an additional time out that prevents the operator from waiting too long for enough messages to arrive, for example when the message producer is inactive.

    Range:
  • polling_time_out

    The time out for each individual poll to the Kafka cluster. Increase this value if the connection has a high latency and you experience lost messages.

    Range:
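
How collection_strategy, counter, time_out and polling_time_out might interact can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not the operator's actual implementation: `collect`, `poll`, `clock` and `FakeTopic` are hypothetical names, the fake topic delivers exactly one message per poll, and the time values are arbitrary rather than the operator's defaults.

```python
import time
from typing import Callable, List


def collect(poll: Callable[[float], List[str]],
            collection_strategy: str,
            counter: int,
            time_out: float,
            polling_time_out: float,
            clock: Callable[[], float] = time.monotonic) -> List[str]:
    """Collect new messages by duration or by number (illustrative only).

    poll(seconds) stands in for one poll of the Kafka cluster and returns
    the messages that arrived during that poll, possibly none.
    """
    if collection_strategy not in ("duration", "number"):
        raise ValueError(f"unknown collection strategy: {collection_strategy!r}")
    # duration: counter is the number of seconds to keep collecting.
    # number:   counter is the message count; time_out caps the total wait.
    limit = counter if collection_strategy == "duration" else time_out
    start = clock()
    collected: List[str] = []
    while clock() - start < limit:
        collected.extend(poll(polling_time_out))
        if collection_strategy == "number" and len(collected) >= counter:
            return collected[:counter]
    return collected


class FakeTopic:
    """Deterministic stand-in for a topic, driven by a fake clock."""

    def __init__(self, active: bool = True) -> None:
        self.now = 0.0
        self.sent = 0
        self.active = active

    def clock(self) -> float:
        return self.now

    def poll(self, seconds: float) -> List[str]:
        self.now += seconds  # pretend the poll blocked for its full time out
        if not self.active:
            return []        # inactive producer: nothing arrives
        self.sent += 1
        return [f"msg-{self.sent}"]


# duration: collect whatever arrives within 3 seconds (time_out is unused here).
busy = FakeTopic()
by_duration = collect(busy.poll, "duration", counter=3, time_out=0.0,
                      polling_time_out=1.0, clock=busy.clock)

# number: stop as soon as 2 messages have been retrieved.
busy = FakeTopic()
by_number = collect(busy.poll, "number", counter=2, time_out=10.0,
                    polling_time_out=1.0, clock=busy.clock)

# number with an inactive producer: time_out ends the wait after 5 seconds.
silent = FakeTopic(active=False)
gave_up = collect(silent.poll, "number", counter=2, time_out=5.0,
                  polling_time_out=1.0, clock=silent.clock)
print(by_duration, by_number, gave_up)
```

In this model the clock and the poll function are passed in, so the stopping rules can be checked without a running Kafka cluster.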

Tutorial Processes

Train and Apply Clustering on data from Kafka Topic

This tutorial process demonstrates the usage of the Read Kafka Topic operator.